-
-
Notifications
You must be signed in to change notification settings - Fork 400
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Packaging import through producers platform #8207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work !
I'll let you see my (minor) comments.
lib/ProductOpener/Import.pm
Outdated
# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units | ||
for (my $i = 1; $i <= 10; $i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units | |
for (my $i = 1; $i <= 10; $i++) { | |
# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units | |
# we currently search up to 10 components | |
$IMPORT_MAX_COMPONENTS = 10; | |
for (my $i = 1; $i <= $IMPORT_MAX_COMPONENTS; $i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
if ($data_is_complete) { | ||
# We seem to have complete data, replace existing data | ||
$product_ref->{packagings} = \@input_packagings; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$data_is_complete only tells that you have at least one complete line. Is this enough to consider complete ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merging packaging data is very tricky and very likely to generate duplicates, so if we have weights from the producer, for at least one component, I think it's better to replace the whole structure.
$empty_regexp = '(?:,|\%|;|_|°|-|\/|\\|\.|\s)*'; | ||
$unknown_regexp = 'unknown|inconnu|inconnue|non renseigné(?:e)?(?:s)?|nr|n\/r'; | ||
$not_applicable_regexp = 'n(?:\/|\\|\.|-)?a(?:\.)?|(?:not|non)(?: |-)applicable|no aplica'; | ||
$none_regexp = 'none|aucun|aucune|aucun\(e\)'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really cool.
|
||
# TODO verify images | ||
# clean csv and sto | ||
unlink $inputs_dir . "eco-score-template.xlsx.csv"; | ||
unlink $inputs_dir . "test.columns_fields.sto"; | ||
rmdir remove_tree($outputs_dir); | ||
#rmdir remove_tree($outputs_dir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shan't we remove outputs dir ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, I commented it for debugging and forgot it
@@ -770,6 +769,23 @@ | |||
"origin" : "france", | |||
"origin_en" : "france", | |||
"other_nutritional_substances_tags" : [], | |||
"packaging_materials_tags" : [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have all those changes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a new feature, I added new packaging_(shapes|materials|recycling) facets in order to be able to better see what producers will send us, and what needs to be added to the taxonomies.
create_sto_from_json($columns_fields_json, $columns_fields_file); | ||
|
||
# step3 convert file | ||
my $converted_file = $outputs_dir . "test.converted.csv"; | ||
my $conv_result; | ||
($out, $err, $conv_result) = capture_ouputs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I did use capture_outputs because I was a bit annoyed by the long logs when running test that make them really hard to exploit…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue I have with it is that if the code inside dies, then the test actually passes, and it's very hard to understand what happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can fix it and keep the capture it would be cool :-)
foreach my $l (qw(en fr es)) { | ||
compare_to_expected_results( | ||
init_fields_columns_names_for_lang($l), | ||
$expected_result_dir . "/column_names_$l.json", | ||
$update_expected_results | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😎 Cool addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well it made me realize that the generated hash tables are huge, and that we probably need a better solution than generating zillions of column synonyms. Maybe do some universal synonyms normalization first.
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
Co-authored-by: Alex Garel <alex@garel.org>
…acts-server into packaging-import
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
This PR enables producers to send us detailed packaging import data through CSV / Excel files uploaded on the producers platform.
The default is for producers to send fields like "packaging 1 shape", "packaging 1 material" etc. for each packaging component, with separate columns for each component.
At least one big producer (Les Mousquetaires / Intermarché) is sending us data with multiple lines for one product (one for each packaging component), so we now have a mechanism to support this as well.
Also included changes: